SCIENTIFIC 

REPORTS 




OPEN 



SUBJECT AREAS: 

GENOME- WIDE 
ASSOCIATION STUDIES 

STATISTICAL METHODS 



Received 
27 March 2014 

Accepted 
3 July 2014 

Published 
21 July 2014 



Correspondence and 
requests for materials 
should be addressed to 
Y.S. (yshen@c2b2. 
columbla.edu) 



* Current address: 
Program in 
Personalized & 
Genomic Medicine 
and Department of 
Medicine, University of 
Maryland, Baltimore, 
MD. 



Estimating heritability of drug-induced 
liver injury from common variants and 
implications for future study designs 



Casey Lynnette Overby' *, George Hripcsak' & Yufeng Shen' 



'Department of Biomedical Informatics, Columbia University, New York, NY, ^Department of Systems Biology, Columbia University, 
New York, NY, ^JP Sulzberger Colu mbio Genome Center, Columbia University, New York, NY. 



Recent genome-wide association studies identified certain human leukocyote antigen (HLA) alleles as the 
major risk factors of drug-induced liver injuries (DILI). While these alleles often cause large relative risk, 
their predictive values are quite low due to low prevalence of idiosyncratic DILI. Finding additional risk 
factors is important for precision medicine. However, optimal design of further genetic studies is hindered 
by uncertain overall heritability of DILI. This is a common problem for low-prevalence pharmacological 
traits, since it is difficult to obtain clinical outcome data in families. Here we estimated the heritability (h^ of 
DILI from case-control genome-wide single nucleotide polymorphism data using a method based on 
random effect models. We estimated the proportion of captured by common SNPs for DILI to be between 
0.3 and 0.5. For co-amoxidav induced DILI, chromosome 6 explained part of the heritability, indicating 
additional contributions from common variants yet to be found. We performed simulations to assess the 
robustness of the estimate with limited sample size under low prevelance, a condition typical to studies on 
idiosyncratic pharmacological traits. Our findings suggest that common variants outside of HLA contribute 
to DILI susceptabUity; therefore, it is valuable to conduct further GWAS with expanded case collection. 



Given the relatively small sample sizes obtainable in pharmacogenetic studies', consortia are forming with 
the goal of improving power to detect associations by pooling data generated from multiple studies. One 
effort relevant to this work is the International Serious Adverse Event Consortium (iSAEC). Phase 1 of 
the project has resulted in one of the largest DILI research collections in the world and initial studies have 
successfully identified genetic variants associated with DILP. 

While there are some successes identifying major risk factors of DILI (e.g., HLA-B*5701 as a major risk factor 
of liver injury due to flucloxacUlin), no SNPs were identified in DILI across all drugs, nor have any been validated 
when flucloxacillin and coamoxiclav are excluded in this collection. This is likely due to limited power of detecting 
modest SNP effects with the relatively smaU cohort size. Understanding the heritability of these conditions is one 
step toward understanding genetic architecture of pharmacogenomic risk, and may provide valuable information 
for designing future studies. 

Genome-wide association studies (GWAS) usually assume that common diseases are attributable to allelic 
variants present in more than 1-5% of the population^. However, most variants identified so far in GWAS 
analyses explain relatively small increases in risk. Also, in complex conditions where heritability has been 
estimated through twin and family studies, GWAS have failed to produce findings to support estimations. For 
example, the heritability of height is estimated to be 80%, however only 4% can be explained by variants identified 
in GWAS studies'". This discrepancy has been described as the missing heritability problem\ Proposed explana- 
tions for this issue are that rare variants with larger effects, gene-gene interactions, and/or gene-environment 
interactions are poorly detected by GWAS approaches'". Another explanation for the missing heritability is that 
common variants of smaller effect have not yet been identified. Peter Visscher's group has developed the 
Genome-wide Complex Trait Analysis (GCTA) method to measure the effect of common variants using a mixed 
model with genome-wide SNP data'. Applying this approach has provided further evidence that a substantial 
proportion of heritability is captured by common SNPs in complex traits including height and body mass index 
(BMI). 

This approach is particularly attractive to apply in the context of pharmacogenomics studies, where ascer- 
tainment of family data is difficult or even impossible. Given the implicit assumption that adverse drug reactions 
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Table 1 [ Heritability estimates from iSAEC genome 


wide data for 


co-amoxiclav and flucloxacilin induced drug- 


induced liver injury 


Drug (population) 


Prevalence 


No. cases 


1 

No. controls 


Estimated Ir^atl (SE) 


Estimated h^chrs (SE) 


cstimatea n MOchr6 w^] 


co-amoxiclav 


0.0005 


201 


532 


0.40 (0.16) 


0.17(0.041) 


0.38 (0.17) 


flucloxacillin 


0.0005 


77 


288 


0.48 (0.32) 


0.48 (0.076) 


0.18 (0.40) 


co-amoxiclav 


0.0001 


201 


532 


0.26 (0.10) 


0.1 1 (0.027) 


0.25 (0.1 1) 


flucloxacillin 


0.0001 


77 


288 


0.31 (0.21) 


0.31 (0.05) 


0.12 (0.26) 



Table 2 | Heritability estimates from WTCCC genome-wide 


data for Tl D 






Tl D Prevalence No. cases No. controls 


Estimated h^all (SE) 


Estimated h^chrs (SE) 


Estimated h^MOchrd (SE) 


0.0005 1963 2938 
0.008 1963 2938 


0.26 (0.022) 
0.434 (0.037) 


0.068 (0.008) 
0.1 1 (0.014) 


0.22 (0.022) 
0.37(0.037) 



(ADRs) are subject to substantial genetic control in pharmacoge- 
nomics studies, it is important to present evidence of the size of 
the genetic component (or heritability) for the trait under 
investigation. 

Heritability estimates for complex traits are typically determined 
through twin studies. However, these approaches have limited utility 
in the context of susceptibility to ADRs". Limitations are primarily 
due to difficulties recruiting and obtaining clinical outcome data in 
twins. This limitation might be overcome by applying the GCTA 
approach that estimates the effect of common variants from gen- 
ome-wide SNP data. In this study we assessed the performance of this 
approach with DILI GWAS datasets. 

Results 

Proportion of heritability captured by genome-wide common 
variants for drug-induced liver injury. We estimated the 
proportion of captured by all genome-wide common SNPs, 
chromosome 6 genome-wide SNPs and genome-wide SNPs 
without chromosome 6 for DILI. Results for liver injuries induced 
by co-amoxiclav and flucloxacillin are summarized in Table 1. We 
observe that for flucloxacillin induced DILI patients, chromosome 6 
explained almost all of the heritability (ft^au ~ 0-48 ± 0.313, ft^chre ~ 
0.478 ± 0.076, ?!^NOchr6 = 0.18 ± 0.40). Whereas with co-amoxiclav 
induced DILI patients, chromosome 6 explained part of the 
heritability (ft\u = 0.40 ± 0.16, W^hvb = 0.17 ± 0.041, ft^NOchre = 
0.378 ± 0.17), indicating contributions from additional common 
variants are yet to be found. Estimates for W may be 
underestimated due to GCTA algorithm constraints so that SNP- 
heritabUity on the observed scale could not be greater than 1 . Even so, 
this finding suggests that continuous collection of DILI cases is 
valuable for the potential of discovering additional associations 
with common genetic variants. 

GCTA algorithm estimation of heritabUity for moderate sample 
sizes. The heritability of TID is estimated from pedigree studies to be 
0.9'''°. To date, the proportion of phenotypic variance explained by 
validated SNPs and genome-wide significant variants (including pre- 
GWAS loci with large effects) is 0.6' ". In addition, the proportion of 
phenotypic variance explained when all GWAS SNPs are considered 
simultaneously is 0.3'''^. With the GCTA algorithm, we estimated the 
proportion of phenotypic variance explained by all genome-wide 
SNPs, chromosome 6 genome-wide SNPs, and genome-wide SNPs 



without chromosome 6. Results of data for all WTCCC data are 
summarized in Table 2. Results of data for 75 and 200 cases are 
summarized in Table 3. Our simulations to assess of the impact of 
sample sizes on heritability estimates indicated relatively stable 
estimates with moderate sample sizes. We estimate the proportion 
of W captured by common SNPs for TID to be, on average, 0.51 ± 
0.23 with a sample size of 75 and 0.46 ±0.15 with a sample size of 
200. These estimates are similar to estimations made with the entire 
dataset of 1963 cases and 2938 controls (h\ii=0.44 ± 0.037). 
Moreover, estimates calculated from sample sizes of 75 and 200 
cases gave similar results, with both indicating that chromosome 6 
explains part of the heritability (the Wchre estimate is 0.19 ± 0.15 with 
75 cases and 0.12 ± 0.058 with 200 cases). For a sample size of 75, 
however, the standard deviation for our h'^chre estimate is close to the 
mean. This indicates that calculations are not stable with smaller 
sample sizes, underscoring the uncertainty about flucloxacUlin- 
induced DILI. Estimates for h^chre calculated using this particular 
TID dataset may also be underestimated due to inadequate SNP 
density in the MHC region in this particular data set. 

Analysis of measurement errors in the drug-induced liver injury 
dataset. Lastly, we observe that the estimated h' is 0 for both 
flucloxacillin induced and co-amoxiclav induced DILI controls 
coded as cases. Results are summarized in Table 4. We conclude 
that there were no substantial measurement errors in our datasets. 

Discussion 

WhUe further investigation is required to confirm the robustness of 
the GCTA algorithm for low prevalence traits such as DILI, this work 
highlights the potential value of its application. Here we were able to 
apply the algorithm to investigate the contribution of common var- 
iants on the chromosome level. Such investigations can provide 
insight into disease mechanism and into inter-individual variation. 
Moreover, for ADRs in particular, we are able to provide previously 
unobtainable estimates of heritability. Such estimates will help 
optimize designs of future studies for identifying additional genetic 
contributions to these conditions. 

Findings suggest that collecting and genotyping more co-amoxi- 
clav induced DILI cases is valuable for discovering additional asso- 
ciations. While further investigation is required to confirm the 
robustness of the GCTA algorithm for low prevalence traits like 
DILI, this work highlights the potential value of its application. 



Table 3 1 Heritability estimation from WTCCC genome-wide 


data for Tl D with limited 


number of samples 




Tl D Prevalence No. Cases No. Controls 


Estimated h^all (SD) 


EsKmated h^^hr6 (SD) 


Estimated h^NOchrd (SD) 


0.008 75 225 
0.008 200 600 


0.51 (0.23) 
0.46 (0.18) 


0.19 (0.15) 
0.12 (0.058) 


0.28 (0.31) 
0.35 (0.16) 
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tII Jill 'il 'I'l i' 1* 1 

lable 4 | Hentabihty estimation to 


assess measurement errors 






DILI indicated drug (population) 


No. cases 


No. controls 


Estimated h^all (SE) 


flucloxacillin 


288 (controls coded as cases) 


864 (WTCCC controls) 


0.0 (0.1 1) 


co-omoxiclav 


306 (controls coded as cases) 


918 (WTCCC controls) 


0.0 (0.1 1) 



Particularly for pharmacological traits, we can provide previously 
unobtainable estimates of heritability. Such estimates will help 
optimize designs for future studies. 

Methods 

Study population. DILI datasets were from case-control studies of individuals taking 
flucloxacillin (77 cases and 288 population controls)^ and individuals taking co- 
amoxiclav (201 cases and 532 population controls)'^. The genotyping data were 
generated using the lUumina IM-duo described previously^' 

Genome- wide complex trait analysis algorithm. The GCTA algorithm measures the 
effect of common variants with genome- wide SNP data using a mixed model 
approach^. The algorithm involves first estimating the genetic relationship matrix 
(GRM) of all individuals, then fitting the GRM in a mixed linear model (MLM) for 
binary traits to estimate the proportion of variance explained by all the autosomal 
SNPs. We used GCTA algorithm extended for case-control designs'^ to estimate 
captured by common SNPs for DILI. We also assessed the robustness of the GCTA 
algorithm with datasets of moderate sample sizes and low prevalence. 

Estimating proportion of heritability captured by common variants. We used the 
GCTA algorithm to estimate from iSAEC DILI genome-wide data (all 
chromosomes), chromosome 6 data, and genome-wide data without chromosome 6. 
We also estimated for all individuals (co-amoxiclav or flucloxacillin induced DILI), 
for individuals with co-amoxiclav induced DILI, and for individuals with 
flucloxacillin induced DILL For individuals with co-amoxiclav induced DILI we 
evaluated northwest Europeans only and southern Europeans both together and 
separately. All analyses were performed separately. For each population and dataset, 
unless otherwise noted, we calculated adjusting for prediction errors due to global 
structure and local structure. We used principal components analysis (PCA) to adjust 
for prediction errors due to global structure. Two principal components were 
included as regression covariates in the mixed model. We also used an optional 
functionality of the GCTA tool to adjust for prediction errors due to imperfect linkage 
disequilibrium. AU analyses were performed for DILI prevalence estimated to be 
O.OOOP'^^ and 0.0005. We reported main results using a prevalence of 0.0005 based on 
recent work (the estimated rate is 0.00043 in Iceland population^^) and a general trend 
of underreporting of DILI incidences'^. 

Assessing the robustness of the GCTA algorithm with moderate sample sizes. To 

evaluate the robustness of the GCTA algorithm with datasets of moderate sample 
sizes, we conducted a positive control and negative control experiment. We 
performed simulations for our positive control experiment. Simulations involved 
estimating for subsets of cases and controls from a Type I Diabetes (TlD) dataset. 
We choose to use a TlD because the histocompatibility complex (MHC) region on 
chromosome 6 has major genetic contribution to risks of both DILI^ and TlD^'-'^. 
Specifically, we used the Welcome Trust Case Control Consortium (WTCCC) TlD 
dataset (1963 cases and 2938 controls)'^. Genotyping was conducted for the WTCCC 
study using an Illumina 550K chip. We estimated from genome-wide data (all 
chromosomes), chromosome 6 data, and genome- wide data without chromosome 6 
for cases and controls with a 1 : 3 ratio, where the number of cases were 75 and 200 to 
reflect sample sizes similar to iSAEC DILI sample sizes. Estimates for were averaged 
over twenty random selections of cases and controls for these sample sizes. The 
prevalence of TlD is estimated to be 0.008'^. We report the mean and standard 
deviation of the twenty estimates. We also report estimates and standard errors 
with the full WTCCC TlD dataset and the GCTA parameter for prevalence set at 
0.0005 and 0.008. Given the WTCCC data were collected from a homogenous 
population, however, we do not adjust for prediction errors due to global structure. 
For a negative control experiment we tested for measurement errors by estimating 
for DILI controls coded as cases with WTCCC controls. This analysis was per- 
formed with northwestern European controls from the flucloxacillinin induced and 
co-amoxiclav induced DILI datasets. The prevalence of DILI was set to be 0.0005. 
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